Abstract

The entropy of a hierarchical network topology in an ensemble of sparse random networks, with 
"hidden variables" associated to its nodes, is the log-likelihood that a given network topology is 
present in the chosen ensemble. We obtain a general formula for this entropy, which has a clear 
interpretation in some simple limiting cases. The results provide new keys with which to solve the 
general problem of "fitting" a given network with an appropriate ensemble of random networks. 
I. INTRODUCTION

The entropy is a key concept in information theory [1] and in the theory of dynamical
systems. In information theory, the problem of inferring a probability distribution
on the basis of a finite number of independent observations is usually addressed using the
maximum likelihood principle or via the minimization of the Kullback-Leibler distance
between the given (empirical) distribution and the inferred one. Recently, several studies have
extended the tools of information theory along these lines in order to measure the
performance of filtering procedures of correlation matrices in the case of multivariate data [3, 4].
In the framework of graph theory the large deviations of the ensemble of random Erdős-Rényi
graphs were derived by studying the free energies of statistical mechanics models
defined on them. There is now increased interest, in the complex networks community,
in the definition of entropy measures that are related to the networks'
topological structure [10] or to diffusion processes defined on them [11]. The inference
problem applied to complex networks can be formulated as the identification of the ensemble
of networks which retains the essential structural characteristics and complexity of a given
real network realization. The identification of this ensemble is an active field of research.
One aims to fit a given specific network with a suitable network ensemble that retains some
information on its structure. Newman has proposed this approach to find the community
structure in a given network. Later, this method was extended to define ensembles
of networks that have other topological characteristics in common with the real network,
such as the degree sequence and/or the degree correlations. As we add further features that
a desired ensemble is to have in common with a given real network, we effectively consider
ensembles with decreasing cardinality. The cardinality of an ensemble of networks with a
given topology has attracted the attention of the graph theory community [15], and
more recently also of the statistical mechanics community.

In this paper we evaluate the entropy of a given hierarchical topology in a "canonical"
or "hidden variable" ensemble, i.e. we calculate the normalized logarithm of the probability
that a given topology appears in this ensemble. By hierarchical topology we will mean the
set of the generalized degrees of the nodes, defined as the sequence $\mathbf{k}_i = (k_i^1, k_i^2, \ldots, k_i^L)$ of
the number of nodes at distance $1, 2, \ldots, L$ from the node $i$. The "canonical" or "hidden
variable" ensembles are generalizations of the $G(N,p)$ ensemble for heterogeneous nodes.
The heterogeneity of the nodes is described in terms of some "hidden
variables" $x_i$, defined on each node $i$ of the network, and the probability $p_{ij}$ of a link between
a node $i$ and a node $j$ is not $p$ as in $G(N,p)$ but is instead determined by a general function
$Q(x_i, x_j)$ of the hidden variables at nodes $i$ and $j$. These ensembles correspond to networks which satisfy
soft constraints: for example, the degree of a node is not fixed, but only the average degree
of each node is fixed, allowing for Poissonian fluctuations.

We derive a general formula for the entropy of a given topology in a "canonical" ensemble
using ideas and methods from the study of diluted combinatorial optimization problems and
statistical mechanical systems on sparse networks [21]. In the simple case where we study the
likelihood of a degree distribution of a network belonging to the chosen ensemble, the entropy is found
to be minus the Kullback-Leibler distance between the probability distribution of the degrees and
the degree distribution expected for the typical topology of networks in the ensemble.

The paper is structured as follows: in section II we introduce the definition of the problem, 
in section III we provide the asymptotic entropy expression of the network topology in a 
given ensemble, in section IV we study the form that the entropy takes in special and relevant 
cases, and the conclusions are presented in section V. 



II. FORMULATION OF THE PROBLEM AND DEFINITIONS 

To model the essential properties of a real network it is useful to think of it as an instance
of an ensemble of networks. The ensemble can be either "microcanonical" or "canonical",
depending on whether the networks in the ensemble are subject to hard or soft constraints.
The main example of what we call a "microcanonical" ensemble is $G(N,M)$, where the
number of links is fixed to be exactly $M$, and the main example of a "canonical" ensemble
is $G(N,p)$, in which only the average number of links $\langle M\rangle = pN(N-1)/2$ is fixed. These
ensembles can be generalized to ensembles of random graphs with a given degree sequence
and with a given hidden variable distribution. In this paper we will calculate the entropy
of a given network topology (defined in terms of its hierarchical structure) in a general
"canonical" ensemble. This entropy is defined as the normalized logarithm of the probability
that the given network topology is found in the "canonical" network ensemble under consideration.
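The difference between hard and soft constraints can be made concrete with a small sampling experiment. The following is a minimal sketch, not taken from the paper: the function names `sample_gnm` and `sample_gnp`, the sizes and the seed are all illustrative assumptions. It draws networks from $G(N,M)$ and from $G(N,p)$ with $p$ chosen such that $\langle M\rangle = M$, and compares the link counts.

```python
import random
from itertools import combinations

def sample_gnm(n, m, rng):
    """Microcanonical G(N, M): exactly m links, chosen uniformly at random."""
    pairs = list(combinations(range(n), 2))
    return set(rng.sample(pairs, m))

def sample_gnp(n, p, rng):
    """Canonical G(N, p): each pair is linked independently with probability p."""
    return {pair for pair in combinations(range(n), 2) if rng.random() < p}

rng = random.Random(0)             # illustrative seed
n, m = 60, 90                      # illustrative sizes
p = 2 * m / (n * (n - 1))          # chosen so that <M> = p N (N - 1) / 2 = m

gnm_counts = [len(sample_gnm(n, m, rng)) for _ in range(200)]
gnp_counts = [len(sample_gnp(n, p, rng)) for _ in range(200)]
mean_gnp = sum(gnp_counts) / len(gnp_counts)

print(set(gnm_counts))                              # hard constraint: one value only
print(min(gnp_counts), mean_gnp, max(gnp_counts))   # soft constraint: fluctuates around m
```

In the first ensemble every sample has the same number of links, whereas in the second only the average is controlled.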
A. "Canonical" ensembles

We consider networks characterized by $N$ nodes (or 'sites') labeled $i = 1, \ldots, N$, and a
symmetric matrix $\mathbf{c}$ with entries $c_{ij} \in \{0, 1\}$ that specify whether ($c_{ij} = 1$) or not ($c_{ij} = 0$)
nodes $i$ and $j$ are connected. We choose $c_{ii} = 0$ for all $i$. We write the set of all
such undirected networks as $\mathcal{G} = \{0, 1\}^{\frac{1}{2}N(N-1)}$. On this set $\mathcal{G}$ we introduce the following
probability measure, in order to define an ensemble $\{\mathcal{G}, W\}$ of random networks:

$$W(\mathbf{c}|\mathbf{x}) = \prod_{i<j}\left[\frac{c}{N}\,Q(x_i,x_j)\,\delta_{c_{ij},1} + \Big(1 - \frac{c}{N}\,Q(x_i,x_j)\Big)\,\delta_{c_{ij},0}\right] \tag{1}$$

The $\{x_i\}$ represent 'hidden variables', drawn for each site independently with statistics
$p(x)$ to be defined later, and the function $Q(x, x') \geq 0$ is chosen such that
$\sum_{x x'} p(x)p(x')Q(x, x') = 1$. The latter condition ensures that asymptotically $c$ represents
the average connectivity, viz. $\lim_{N\to\infty} N^{-1}\sum_{ij}\langle c_{ij}\rangle = c$. Note that throughout this paper
the 'hidden variables' $\{x_i\}$ can be scalar, discrete or multidimensional.
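As an illustration of the measure (1), the sketch below samples one network from a hidden-variable ensemble. It is a minimal example under stated assumptions: the helper name `sample_hidden_variable_network`, the two-valued hidden variable and the product kernel $Q(x,x') = xx'/\langle x\rangle^2$ are hypothetical choices made here, not choices made in the paper.

```python
import random

def sample_hidden_variable_network(xs, Q, c, rng):
    """Draw a symmetric 0/1 matrix with P(c_ij = 1) = (c / N) * Q(x_i, x_j) for i < j."""
    n = len(xs)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p_ij = c * Q(xs[i], xs[j]) / n
            if not 0.0 <= p_ij <= 1.0:
                raise ValueError("c * Q / N must be a probability")
            if rng.random() < p_ij:
                adj[i][j] = adj[j][i] = 1   # undirected link, diagonal stays zero
    return adj

def product_kernel(x, y, mean_x=2.0):
    """Hypothetical kernel Q(x, x') = x x' / <x>^2, normalized for p(1) = p(3) = 1/2."""
    return x * y / mean_x ** 2

rng = random.Random(1)                             # illustrative seed
n, c = 400, 4.0
xs = [rng.choice([1.0, 3.0]) for _ in range(n)]    # hidden variables drawn from p(x)
adj = sample_hidden_variable_network(xs, product_kernel, c, rng)

degrees = [sum(row) for row in adj]
mean_degree = sum(degrees) / n
print(mean_degree)   # fluctuates around c; nodes with x = 3 have larger degrees on average
```

Only the expected degree of a node is set by its hidden variable; the realized degrees fluctuate from sample to sample.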



B. Hierarchically constrained topologies

Next we introduce a hierarchy of single-site observables with the objective to characterize
with increasing precision the local topology of a network $\mathbf{c} \in \mathcal{G}$. They can be interpreted as
generalized degrees $\mathbf{k}_i(\mathbf{c}) = (k_i^1(\mathbf{c}), \ldots, k_i^L(\mathbf{c}))$ of individual nodes $i$:

$$k_i^\ell(\mathbf{c}) = \sum_{j_1 \ldots j_\ell} c_{i j_1} c_{j_1 j_2} \cdots c_{j_{\ell-1} j_\ell} \in \{0, 1, 2, \ldots\} \tag{2}$$

In the absence of local loops, $k_i^\ell(\mathbf{c})$ measures the size (in number of nodes) of the
local environment of node $i$ at a distance of $\ell$ links. However, in this tree the nodes are
counted with a multiplicity equal to their number of descendants encountered; similarly, in
the case of local loops, nodes that can be visited from site $i$ via multiple routes of length
$\leq \ell$ are counted with this multiplicity. Note that $k_i^1(\mathbf{c}) = \sum_j c_{ij}$ is the ordinary degree of
node $i$, and that (2) can also be written as

$$k_i^1(\mathbf{c}) = \sum_j c_{ij}, \qquad k_i^{\ell+1}(\mathbf{c}) = \sum_j c_{ij}\, k_j^{\ell}(\mathbf{c}) \tag{3}$$

By definition, if $k_i^1(\mathbf{c}) = 0$ then $k_i^\ell(\mathbf{c}) = 0$ for all $\ell$. It is now natural to characterize the global
topology of a network $\mathbf{c}$ either by giving its $N$ generalized degree vectors $\{\mathbf{k}_1(\mathbf{c}), \ldots, \mathbf{k}_N(\mathbf{c})\}$
themselves, or by giving the collective generalized degree statistics, conditioned on the values
of the hidden variables, i.e.

$$P(\mathbf{k}|x,\mathbf{c}) = P(k_1,\ldots,k_L|x,\mathbf{c}) = \frac{1}{N p(x)}\sum_{i=1}^{N}\delta_{\mathbf{k},\mathbf{k}_i(\mathbf{c})}\,\delta(x - x_i) \tag{4}$$

We adopt the convention that always $\mathbf{k} = (k_1, \ldots, k_L) \in \mathbb{N}^L$, unless indicated otherwise.
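The recursion (3) and the statistics (4) translate directly into a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the hidden variables are assumed to be discrete labels, and $p(x)$ is taken to be the empirical fraction of nodes carrying label $x$.

```python
from collections import Counter

def generalized_degrees(adj, L):
    """Return k_i = (k_i^1, ..., k_i^L) via k_i^{l+1} = sum_j c_ij k_j^l, with k_j^0 = 1."""
    n = len(adj)
    prev = [1] * n
    levels = []
    for _ in range(L):
        prev = [sum(adj[i][j] * prev[j] for j in range(n)) for i in range(n)]
        levels.append(prev)
    return [tuple(level[i] for level in levels) for i in range(n)]

def conditional_statistics(ks, xs):
    """Empirical P(k | x, c): fraction of the nodes with label x that have vector k."""
    per_x = Counter(xs)
    joint = Counter(zip(ks, xs))
    return {(k, x): count / per_x[x] for (k, x), count in joint.items()}

# Path graph 0 - 1 - 2 with hypothetical hidden-variable labels.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
xs = ["a", "b", "a"]
ks = generalized_degrees(adj, L=2)
stats = conditional_statistics(ks, xs)
print(ks)      # [(1, 2), (2, 2), (1, 2)]
print(stats)
```

Node 0 has only one node at distance two, yet $k_0^2 = 2$: the backtracking route 0-1-0 is counted as well, which is the multiplicity effect described above.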



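C. Entropy of a constrained network topology in a given ensemble

Our goal is to quantify to what extent the above characterization of networks, by the
generalized degrees $\{\mathbf{k}\} = \{\mathbf{k}_1, \ldots, \mathbf{k}_N\}$ or by the degree statistics $P(\mathbf{k}|x)$, specifies their
micro-structure. This can be measured by the effective number of networks in the ensemble
$\{\mathcal{G}, W\}$ that meet the relevant constraints, i.e. (apart from a constant) by the Boltzmann
entropies:

$$\text{constrained degrees:}\quad \Omega_L[\{\mathbf{k}\}|\mathbf{x}] = \frac{1}{N}\log\sum_{\mathbf{c}\in\mathcal{G}} W(\mathbf{c}|\mathbf{x})\prod_i \delta_{\mathbf{k}_i,\mathbf{k}_i(\mathbf{c})} \tag{5}$$

$$\text{constrained statistics:}\quad \Omega_L[P|\mathbf{x}] = \frac{1}{N}\log\sum_{\mathbf{c}\in\mathcal{G}} W(\mathbf{c}|\mathbf{x})\prod_{\mathbf{k}}\prod_{x}\delta\big[P(\mathbf{k}|x) - P(\mathbf{k}|x,\mathbf{c})\big]$$
$$\qquad = \frac{1}{N}\log\sum_{\mathbf{k}_1\ldots\mathbf{k}_N}\prod_{\mathbf{k},x}\delta\Big[P(\mathbf{k}|x) - \frac{1}{Np(x)}\sum_i\delta_{\mathbf{k},\mathbf{k}_i}\,\delta(x-x_i)\Big]\,e^{N\Omega_L[\{\mathbf{k}\}|\mathbf{x}]} \tag{6}$$

The larger $\Omega_L[\ldots]$, the larger the effective number of graphs with the imposed global topology,
viz. $\{\mathbf{k}_i\}$ or $P(\mathbf{k}|x)$, so the less specific is the proposed macroscopic topology
characterization. We will find that generally $\Omega_L[\ldots] = O(N^0)$ as $N \to \infty$. The remainder of this
paper deals with the calculation of (5) and (6) in the limit $N \to \infty$, and their dependence
on the choices made for $P(\mathbf{k}|x)$ and for the ensemble characteristics as defined by $p(x)$ and
$Q(x, x')$.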
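For very small $N$ the constrained-degree entropy can be evaluated by brute force, which makes the meaning of the sum over networks explicit. The sketch below assumes $L = 1$ (ordinary degrees) and the normalization $\frac{1}{N}\log$ of the link-probability measure with $p_{ij} = cQ(x_i,x_j)/N$; the function name `log_prob_degrees` and the example values are hypothetical.

```python
import math
from itertools import combinations, product

def log_prob_degrees(target, xs, Q, c):
    """(1/N) log of sum_c W(c|x) prod_i delta(k_i, k_i(c)) for L = 1, by enumeration."""
    n = len(xs)
    pairs = list(combinations(range(n), 2))
    total = 0.0
    for links in product([0, 1], repeat=len(pairs)):   # all 2^(N(N-1)/2) networks
        weight = 1.0
        degrees = [0] * n
        for (i, j), c_ij in zip(pairs, links):
            p_ij = c * Q(xs[i], xs[j]) / n
            weight *= p_ij if c_ij else 1.0 - p_ij
            degrees[i] += c_ij
            degrees[j] += c_ij
        if tuple(degrees) == tuple(target):            # Kronecker deltas on all degrees
            total += weight
    return math.log(total) / n if total > 0 else float("-inf")

def flat_kernel(x, y):
    """Q(x, x') = 1: every link has the same probability c / N."""
    return 1.0

# Triangle on N = 3 nodes with c = 1.5, so every link has probability 1/2.
omega = log_prob_degrees((2, 2, 2), xs=[0, 0, 0], Q=flat_kernel, c=1.5)
print(omega)
```

Only the triangle realizes the degree sequence (2, 2, 2), so the sum reduces to a single term of weight $(1/2)^3$. The enumeration grows as $2^{N(N-1)/2}$ and is feasible only for a handful of nodes, which is why the asymptotic calculation is needed.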

III. ASYMPTOTIC VALUES OF THE ENTROPY OF NETWORK TOPOLOGY 
IN A GIVEN ENSEMBLE 

A. Derivation of steepest descent extremization formulas 

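Since the ensemble (1) is invariant under all node permutations, the difference between
the two formulae (5,6) should reflect only the node permutation freedom that is present in
(6) but absent from (5). We evaluate (5,6) by writing each Kronecker $\delta$ and each $\delta$-function
in integral form. Upon defining the short-hands $k_i^0 = 1$ for all $i$, and
$\boldsymbol{\omega}_i\cdot\mathbf{k}_i = \sum_{\ell=1}^{L}\omega_i^\ell k_i^\ell$,
expression (3) allows us to simplify the term $\prod_i\delta_{\mathbf{k}_i,\mathbf{k}_i(\mathbf{c})}$ to

$$\prod_i \delta_{\mathbf{k}_i,\mathbf{k}_i(\mathbf{c})} = \int_{-\pi}^{\pi}\prod_i\left[\frac{d\boldsymbol{\omega}_i}{(2\pi)^L}\,e^{i\boldsymbol{\omega}_i\cdot\mathbf{k}_i}\right] e^{-i\sum_{ij}c_{ij}\sum_{\ell=1}^{L}\omega_i^\ell k_j^{\ell-1}} \tag{7}$$

We next define

$$D[\{\boldsymbol{\omega},\mathbf{k}\}|\mathbf{x}] = \sum_{\mathbf{c}\in\mathcal{G}} W(\mathbf{c}|\mathbf{x})\, e^{-i\sum_{ij}c_{ij}\sum_{\ell=1}^{L}\omega_i^\ell k_j^{\ell-1}} \tag{8}$$

and subsequently find that our two entropies can be written in the form

$$\lim_{N\to\infty}\Omega_L[\{\mathbf{k}\}|\mathbf{x}] = \lim_{N\to\infty}\frac{1}{N}\log\int_{-\pi}^{\pi}\prod_i\left[\frac{d\boldsymbol{\omega}_i}{(2\pi)^L}\,e^{i\boldsymbol{\omega}_i\cdot\mathbf{k}_i}\right] D[\{\boldsymbol{\omega},\mathbf{k}\}|\mathbf{x}] \tag{9}$$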



n L [p\ 



X 



k,x 

— lim log / TT 

k,x 

>< e /*n[ 

ftl .../VjV 



iiVP(fe|a:)P(fe|a;) 



2?r/iV 



E < 



Arni[{fe}|cc]-iEiP(feil^) 



k\...kj\[ 

dP(k\x)e lNAp ^ p ^- 
2vr/iVA . 

^ a ,. e i[W i -fe i -P(fe i |a; i )/p(a; i )] - 



(10) 



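The core of the problem is apparently to calculate the function $D[\{\boldsymbol{\omega},\mathbf{k}\}|\mathbf{x}]$ in (8), which
involves the introduction of a measure
$W(\boldsymbol{\omega},\mathbf{k},x|\{\boldsymbol{\omega},\mathbf{k}\}) = N^{-1}\sum_i \delta_{\mathbf{k},\mathbf{k}_i}\,\delta(x - x_i)\,\delta(\boldsymbol{\omega}-\boldsymbol{\omega}_i)$:

$$D[\{\boldsymbol{\omega},\mathbf{k}\}|\mathbf{x}] = \prod_{i<j}\left\{1 + \frac{c}{N}\,Q(x_i,x_j)\Big[e^{-i\sum_{\ell=1}^{L}(\omega_i^\ell k_j^{\ell-1} + \omega_j^\ell k_i^{\ell-1})} - 1\Big]\right\}$$
$$= \exp\Big\{\tfrac{1}{2}cN\int dx\,dx'\,Q(x,x')\int d\boldsymbol{\omega}\,d\boldsymbol{\omega}'\sum_{\mathbf{k}\mathbf{k}'} W(\boldsymbol{\omega},\mathbf{k},x|\ldots)\,W(\boldsymbol{\omega}',\mathbf{k}',x'|\ldots)\Big[e^{-i\sum_{\ell=1}^{L}[\omega_\ell k'_{\ell-1} + \omega'_\ell k_{\ell-1}]} - 1\Big] + O(N^0)\Big\} \tag{11}$$

(with $k_0 = k'_0 = 1$). We isolate $W(\ldots|\ldots)$ via suitable integrations over $\delta$-functions, using the functional measure
$\{dW\} = \lim_{\Delta\omega\to 0}\lim_{\Delta x\to 0}\prod_{\boldsymbol{\omega},\mathbf{k},x}[dW(\boldsymbol{\omega},\mathbf{k},x)\,\Delta\omega\,\Delta x\,\sqrt{N/2\pi}]$, resulting in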
D[{u,k}) = | {dW ^ }e ^/-\^^E fe ^,fe,,) W (a,,fc,,) +O (iv0) 

\cN f dxdx'Q{x,x') J^dWdW' EjUfc' W / (CJ,fc,x)VK(a;',fc',x') 



xe 



xe" 



(12) 



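Now only the last line contains microscopic variables, and it factorizes fully over the nodes of
the network. Upon inserting (12) into (9) and (10) this allows us to evaluate both expressions
for $N \to \infty$ via steepest descent integration over the distributions $W(\boldsymbol{\omega},\mathbf{k},x)$, leading to

$$\lim_{N\to\infty}\Omega_L[\{\mathbf{k}\}|\mathbf{x}] = \mathrm{extr}_{\{W,\hat{W}\}}\,\Psi_1[\{W,\hat{W}\}] \tag{13}$$
$$\lim_{N\to\infty}\Omega_L[P|\mathbf{x}] = \mathrm{extr}_{\{W,\hat{W},\hat{P}\}}\,\Psi_2[\{W,\hat{W},\hat{P}\}] \tag{14}$$

with the functions

$$\Psi_1[\ldots] = i\int d\boldsymbol{\omega}\,dx\sum_{\mathbf{k}}\hat{W}(\boldsymbol{\omega},\mathbf{k},x)\,W(\boldsymbol{\omega},\mathbf{k},x) + \Phi[\{W\}] + \int dx\,p(x)\sum_{\mathbf{k}}P(\mathbf{k}|x)\log\int_{-\pi}^{\pi}\frac{d\boldsymbol{\omega}}{(2\pi)^L}\,e^{i[\boldsymbol{\omega}\cdot\mathbf{k} - \hat{W}(\boldsymbol{\omega},\mathbf{k},x)]} \tag{15}$$

$$\Psi_2[\ldots] = i\int d\boldsymbol{\omega}\,dx\sum_{\mathbf{k}}\hat{W}(\boldsymbol{\omega},\mathbf{k},x)\,W(\boldsymbol{\omega},\mathbf{k},x) + \Phi[\{W\}] + i\int dx\sum_{\mathbf{k}}\hat{P}(\mathbf{k}|x)\,P(\mathbf{k}|x)$$
$$\qquad + \int dx\,p(x)\log\sum_{\mathbf{k}}\int_{-\pi}^{\pi}\frac{d\boldsymbol{\omega}}{(2\pi)^L}\,e^{i[\boldsymbol{\omega}\cdot\mathbf{k} - \hat{W}(\boldsymbol{\omega},\mathbf{k},x) - \hat{P}(\mathbf{k}|x)/p(x)]} \tag{16}$$

where

$$\Phi[\{W\}] = \tfrac{1}{2}c\int dx\,dx'\,Q(x,x')\int d\boldsymbol{\omega}\,d\boldsymbol{\omega}'\sum_{\mathbf{k}\mathbf{k}'}W(\boldsymbol{\omega},\mathbf{k},x)\,W(\boldsymbol{\omega}',\mathbf{k}',x')\Big[e^{-i\sum_{\ell=1}^{L}[\omega_\ell k'_{\ell-1} + \omega'_\ell k_{\ell-1}]} - 1\Big] \tag{17}$$

It will be convenient to introduce new functions $Q(\mathbf{k}|x) = \exp[-i\hat{P}(\mathbf{k}|x)/p(x)]$ and
$V(\boldsymbol{\omega},\mathbf{k},x) = \exp[-i\hat{W}(\boldsymbol{\omega},\mathbf{k},x)]$, so that our saddle-point problems simplify to

$$\lim_{N\to\infty}\Omega_L[\{\mathbf{k}\}|\mathbf{x}] = \mathrm{extr}_{\{V,W\}}\,\Psi_1[\{V,W\}] \tag{18}$$
$$\lim_{N\to\infty}\Omega_L[P|\mathbf{x}] = \mathrm{extr}_{\{Q,V,W\}}\,\Psi_2[\{Q,V,W\}] \tag{19}$$

with the functions

$$\Psi_1[\{V,W\}] = \Phi[\{W\}] - \int d\boldsymbol{\omega}\,dx\sum_{\mathbf{k}}W(\boldsymbol{\omega},\mathbf{k},x)\log V(\boldsymbol{\omega},\mathbf{k},x) + \int dx\,p(x)\sum_{\mathbf{k}}P(\mathbf{k}|x)\log\int_{-\pi}^{\pi}\frac{d\boldsymbol{\omega}}{(2\pi)^L}\,V(\boldsymbol{\omega},\mathbf{k},x)\,e^{i\boldsymbol{\omega}\cdot\mathbf{k}} \tag{20}$$

$$\Psi_2[\{Q,V,W\}] = \Phi[\{W\}] - \int d\boldsymbol{\omega}\,dx\sum_{\mathbf{k}}W(\boldsymbol{\omega},\mathbf{k},x)\log V(\boldsymbol{\omega},\mathbf{k},x) - \int dx\,p(x)\sum_{\mathbf{k}}P(\mathbf{k}|x)\log Q(\mathbf{k}|x)$$
$$\qquad + \int dx\,p(x)\log\sum_{\mathbf{k}}Q(\mathbf{k}|x)\int_{-\pi}^{\pi}\frac{d\boldsymbol{\omega}}{(2\pi)^L}\,V(\boldsymbol{\omega},\mathbf{k},x)\,e^{i\boldsymbol{\omega}\cdot\mathbf{k}} \tag{21}$$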

B. Simplification and reduction of the functional saddle-point equations 

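We can now carry out the functional variations of $\Psi_1[\ldots]$ and $\Psi_2[\ldots]$ and find the
saddle-point equations from which to solve $\{Q, V, W\}$. For $\Psi_1[\ldots]$ (referring to ensembles with
constrained generalized degrees) these are found to be the following:

$$\log V(\boldsymbol{\omega},\mathbf{k},x) = c\int dx'\,Q(x,x')\int d\boldsymbol{\omega}'\sum_{\mathbf{k}'}W(\boldsymbol{\omega}',\mathbf{k}',x')\Big[e^{-i\sum_{\ell=1}^{L}[\omega_\ell k'_{\ell-1} + \omega'_\ell k_{\ell-1}]} - 1\Big] \tag{22}$$

$$W(\boldsymbol{\omega},\mathbf{k},x) = p(x)\,P(\mathbf{k}|x)\,\frac{V(\boldsymbol{\omega},\mathbf{k},x)\,e^{i\boldsymbol{\omega}\cdot\mathbf{k}}}{\int_{-\pi}^{\pi}d\boldsymbol{\omega}'\,V(\boldsymbol{\omega}',\mathbf{k},x)\,e^{i\boldsymbol{\omega}'\cdot\mathbf{k}}} \tag{23}$$

For $\Psi_2[\ldots]$ (referring to ensembles with constrained distributions of generalized degrees)
these are found to be the following:

$$\log V(\boldsymbol{\omega},\mathbf{k},x) = c\int dx'\,Q(x,x')\int d\boldsymbol{\omega}'\sum_{\mathbf{k}'}W(\boldsymbol{\omega}',\mathbf{k}',x')\Big[e^{-i\sum_{\ell=1}^{L}[\omega_\ell k'_{\ell-1} + \omega'_\ell k_{\ell-1}]} - 1\Big] \tag{24}$$

$$W(\boldsymbol{\omega},\mathbf{k},x) = \frac{p(x)\,Q(\mathbf{k}|x)\,V(\boldsymbol{\omega},\mathbf{k},x)\,e^{i\boldsymbol{\omega}\cdot\mathbf{k}}}{\sum_{\mathbf{k}'}Q(\mathbf{k}'|x)\int_{-\pi}^{\pi}d\boldsymbol{\omega}'\,V(\boldsymbol{\omega}',\mathbf{k}',x)\,e^{i\boldsymbol{\omega}'\cdot\mathbf{k}'}} \tag{25}$$

$$P(\mathbf{k}|x) = \frac{Q(\mathbf{k}|x)\int_{-\pi}^{\pi}d\boldsymbol{\omega}\,V(\boldsymbol{\omega},\mathbf{k},x)\,e^{i\boldsymbol{\omega}\cdot\mathbf{k}}}{\sum_{\mathbf{k}'}Q(\mathbf{k}'|x)\int_{-\pi}^{\pi}d\boldsymbol{\omega}'\,V(\boldsymbol{\omega}',\mathbf{k}',x)\,e^{i\boldsymbol{\omega}'\cdot\mathbf{k}'}} \tag{26}$$

The last equation is easily solved, viz.

$$Q(\mathbf{k}|x) = \frac{P(\mathbf{k}|x)}{\int_{-\pi}^{\pi}d\boldsymbol{\omega}\,V(\boldsymbol{\omega},\mathbf{k},x)\,e^{i\boldsymbol{\omega}\cdot\mathbf{k}}} \tag{27}$$

whereas in both cases (constrained degrees versus constrained degree statistics) we can
eliminate immediately the kernels $W(\boldsymbol{\omega},\mathbf{k},x)$, leaving us in either case with a closed problem
for the kernel $V(\boldsymbol{\omega},\mathbf{k},x)$ only. Upon inserting the solution (27) into (26) one finds that this
remaining problem is in fact identical for both types of constraints, namely

$$\log V(\boldsymbol{\omega},\mathbf{k},x) = c\int dx'\,p(x')\,Q(x,x')\sum_{\mathbf{k}'}P(\mathbf{k}'|x')\;\frac{\int_{-\pi}^{\pi}d\boldsymbol{\omega}'\,V(\boldsymbol{\omega}',\mathbf{k}',x')\,e^{i\boldsymbol{\omega}'\cdot\mathbf{k}'}\Big[e^{-i\sum_{\ell=1}^{L}[\omega_\ell k'_{\ell-1} + \omega'_\ell k_{\ell-1}]} - 1\Big]}{\int_{-\pi}^{\pi}d\boldsymbol{\omega}'\,V(\boldsymbol{\omega}',\mathbf{k}',x')\,e^{i\boldsymbol{\omega}'\cdot\mathbf{k}'}} \tag{28}$$

In addition one finds that (23) holds in both cases. The solution of (28) is of the form

$$V(\boldsymbol{\omega},\mathbf{k},x) = \exp\Big[c\sum_{\boldsymbol{\xi}\in\mathbb{N}^L}\gamma(\mathbf{k},\boldsymbol{\xi},x)\,e^{-i\boldsymbol{\omega}\cdot\boldsymbol{\xi}} - c\int dx'\,p(x')\,Q(x,x')\Big] \tag{29}$$

where $\gamma(\mathbf{k},\boldsymbol{\xi},x)$ then obeys

$$\gamma(\mathbf{k},\boldsymbol{\xi},x) = \int dx'\,p(x')\,Q(x,x')\sum_{\mathbf{k}'}P(\mathbf{k}'|x')\;\delta_{\xi_1,1}\prod_{\ell=1}^{L-1}\delta_{\xi_{\ell+1},k'_\ell}$$
$$\qquad\times\;\frac{\int_{-\pi}^{\pi}d\boldsymbol{\omega}\,\exp\Big[i\sum_{\ell=1}^{L}\omega_\ell(k'_\ell - k_{\ell-1}) + c\sum_{\boldsymbol{\xi}'}\gamma(\mathbf{k}',\boldsymbol{\xi}',x')\,e^{-i\boldsymbol{\omega}\cdot\boldsymbol{\xi}'}\Big]}{\int_{-\pi}^{\pi}d\boldsymbol{\omega}\,\exp\Big[i\boldsymbol{\omega}\cdot\mathbf{k}' + c\sum_{\boldsymbol{\xi}'}\gamma(\mathbf{k}',\boldsymbol{\xi}',x')\,e^{-i\boldsymbol{\omega}\cdot\boldsymbol{\xi}'}\Big]} \tag{30}$$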

The two integrals over $\boldsymbol{\omega}$ in the latter fraction can be done. Both are of the form



X(fc, A;', x) = duj exp 



£ I IN*'. «",*'! 

^ GIN 



m! 

m>0 £™^„ jL n=l 



(31) 



and hence the equation for $\gamma(\mathbf{k},\boldsymbol{\xi},x)$ becomes

/L-l 
dx'p{x')Q{x, x') ^ P(k'\x') Y[ <%+i,Ar 
k' e=1 



(32) 



X 



Z_/m>0 m! ■^^ 1 ...^ m e]N L 




=1 7(fc',rv)" 


nLi <^-i+£ n < m s? 











/k' L 1 

dx'p(x')Q(x, x') ^2 —P(k'\x') Yl %+i,fe^ 



x ■ 











nnli7(fc',r,^ 





(33) 



where we use the conventions that $[\prod_{n=1}^{m} u_n]_{m=0} = 1$, $[\sum_{n=1}^{m} u_n]_{m=0} = 0$, and
$[\sum_{\boldsymbol{\xi}^1\ldots\boldsymbol{\xi}^m} u(\boldsymbol{\xi}^1,\ldots,\boldsymbol{\xi}^m)]_{m=0} = 1$. If $L = 1$ we have $\mathbf{k} \to k$ and $\boldsymbol{\xi} \to 1$, so $\gamma(\mathbf{k},\boldsymbol{\xi},x) \to \gamma(k,x)$.
This describes the situation where the degrees are not generalized, but measure as usual only
the number of direct links per node. Here our equation for $\gamma(\ldots)$ simplifies drastically to

$$L = 1:\qquad \gamma(k,x) = \int dx'\,p(x')\,Q(x,x')\sum_{k'}P(k'|x')\,\frac{k'}{c\,\gamma(k',x')} \tag{34}$$

The right-hand side is clearly independent of $k$, so $\gamma(k,x) = \gamma(x)$ with

$$\gamma(x) = \int dx'\,p(x')\,\frac{Q(x,x')}{c\,\gamma(x')}\sum_{k}k\,P(k|x') \tag{35}$$
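The factor $k'/(c\gamma)$ in the $L = 1$ equation can be traced to an elementary Fourier integral. The short derivation below is a sketch under one assumption that is inferred from the surrounding formulas rather than quoted from them: that for $L = 1$ the kernel has the form $V(\omega,k,x) \propto \exp[c\,\gamma(k,x)\,e^{-i\omega}]$, with an $\omega$-independent prefactor that cancels in the ratio.

```latex
% Assumption: for L = 1, V(\omega,k,x) \propto \exp[c\,\gamma(k,x)\,e^{-i\omega}].
\begin{align}
\int_{-\pi}^{\pi}\! d\omega\; e^{i\omega n}\exp\!\big[c\gamma\,e^{-i\omega}\big]
  &= \sum_{m\ge 0}\frac{(c\gamma)^m}{m!}\int_{-\pi}^{\pi}\! d\omega\; e^{i\omega(n-m)}
   = 2\pi\,\frac{(c\gamma)^{n}}{n!} \qquad (n \ge 0), \\
\frac{\int_{-\pi}^{\pi} d\omega'\, V(\omega',k',x')\, e^{i\omega'(k'-1)}}
     {\int_{-\pi}^{\pi} d\omega'\, V(\omega',k',x')\, e^{i\omega' k'}}
  &= \frac{(c\gamma')^{k'-1}/(k'-1)!}{(c\gamma')^{k'}/k'!}
   = \frac{k'}{c\,\gamma(k',x')}, \qquad \gamma' \equiv \gamma(k',x') .
\end{align}
```

For $k' = 0$ the numerator integral vanishes (the case $n = -1$), in agreement with the factor $k'$ on the right-hand side.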

If $L > 1$ we can manipulate at most some further Kronecker $\delta$s, and the final form is therefore
7 (fc, t x) = / dx'p(x')Q(x, x') Pfa, • • • , a, k'\x') (36) 

^ k'>0 







,nrv)" 


4',fc i _ 1 +E„ <52 5E rifci 1 %+i-^-i,E„< 52 €? 






nti7((6,... 




x') 


5 fc'.En<f 2 CE nil 1 <W,E„<e 2 «? 



C. Simplification of the asymptotic entropy formulas 

At this stage we insert our previous results for the kernels $\{V, W, Q\}$ into (18,19,20,21) to
arrive at more explicit expressions for the asymptotic entropies, which will only involve the
function $\gamma(\mathbf{k},\boldsymbol{\xi},x)$ of (36). The first step is to substitute expression (27) into (21). This leads,
in combination with the fact that at the relevant saddle-points the kernels $\{V, W\}$ obey
identical equations for the two cases (constrained generalized degrees versus constrained statistics
of generalized degrees), to the simple and natural relation between our two entropies:

$$\lim_{N\to\infty}\Omega_L[P|\mathbf{x}] = \lim_{N\to\infty}\Omega_L[\{\mathbf{k}\}|\mathbf{x}] - \int dx\,p(x)\sum_{\mathbf{k}}P(\mathbf{k}|x)\log P(\mathbf{k}|x) \tag{37}$$

The extra freedom to construct microscopic network realizations in the case where we only
constrain the generalized degree distribution, as opposed to constraining the actual values
of the generalized degrees, is measured by the Shannon entropy of the imposed distribution
$P(\mathbf{k}|x)$.

The relation (37) could also be derived from the definitions of $\Omega[\ldots]$ given in (5)-(6). In
fact we can observe that the probability $W(\mathbf{c}|\mathbf{x})$ present in the definition (5) of $\Omega_L[\{\mathbf{k}\}|\mathbf{x}]$
is invariant under all permutations of the labels of those nodes that have the same "hidden
variable" $x$; this follows directly from definition (1). Consequently $\Omega_L[\{\mathbf{k}\}|\mathbf{x}]$ must also be
invariant under any permutation of the labels of nodes with the same value of $x$. It follows that
$\Omega_L[\{\mathbf{k}\}|\mathbf{x}]$ depends on the degree sequence $\{\mathbf{k}\}$ only through the distributions $\{P(\mathbf{k}|x)\}$.
Therefore, we can use this simple insight to predict the relation between $\Omega_L[\{\mathbf{k}\}|\mathbf{x}]$ and
$\Omega_L[P|\mathbf{x}]$ (6). In fact, because $\Omega_L[\{\mathbf{k}\}|\mathbf{x}]$ depends only on the distribution $P(\mathbf{k}|x)$,
we have that

$$e^{N\Omega_L[P|\mathbf{x}]} = \prod_{x}\left\{\frac{[Np(x)]!}{\prod_{\mathbf{k}}[Np(x)P(\mathbf{k}|x)]!}\right\}e^{N\Omega_L[\{\mathbf{k}\}|\mathbf{x}]} \tag{38}$$

where $\{\mathbf{k}\}$ is any generalized degree sequence with degree distributions $P(\mathbf{k}|x)$. Using (38)
and Stirling's formula we can derive relation (37).
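The step from a multinomial counting factor to a Shannon entropy rests on Stirling's formula and is easy to check numerically. The sketch below is illustrative only: it considers a single value of $x$, a hypothetical three-valued distribution, and hypothetical function names.

```python
import math

def log_multinomial_per_node(counts):
    """(1/N) log [ N! / prod_k n_k! ] for occupation numbers n_k that sum to N."""
    n = sum(counts)
    return (math.lgamma(n + 1) - sum(math.lgamma(m + 1) for m in counts)) / n

def shannon_entropy(probs):
    """-sum_k P(k) log P(k), with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

probs = [0.5, 0.3, 0.2]                      # hypothetical P(k|x) for a single x
for n in (100, 10_000, 1_000_000):
    counts = [round(p * n) for p in probs]   # number of nodes carrying each k
    print(n, log_multinomial_per_node(counts), shannon_entropy(probs))
```

The two printed columns approach each other as $N$ grows, with a difference of order $(\log N)/N$.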

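In order to evaluate $\lim_{N\to\infty}\Omega_L[P|\mathbf{x}]$ we only need to express $\lim_{N\to\infty}\Omega_L[\{\mathbf{k}\}|\mathbf{x}]$ in
terms of the function $\gamma(\mathbf{k},\boldsymbol{\xi},x)$. We first note that at the relevant saddle-point the function
$\Phi[\{W\}]$ (17) takes the value

$$\Phi[\{W\}] = \tfrac{1}{2}\int dx\int d\boldsymbol{\omega}\sum_{\mathbf{k}}W(\boldsymbol{\omega},\mathbf{k},x)\log V(\boldsymbol{\omega},\mathbf{k},x) \tag{39}$$

Insertion into (20), followed by elimination of $W(\boldsymbol{\omega},\mathbf{k},x)$ via (23), leads us to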



^77 J 

Jim Q L [{k}\x] = I dx p(x)J2 P ( k \ x ) lo S / V(u,k,x)e iW - k 

k 



N^oo 



~ J dx p(x)^P(k 



x 



-* (27T) 

j*du e lU; k V (u;, k, x) log V(u>, k, x) 



(40) 



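where $V_0(\boldsymbol{\omega},\mathbf{k},x) = V(\boldsymbol{\omega},\mathbf{k},x)\exp[c\int dx'\,Q(x,x')p(x')]$. The final step is the elimination of
$V(\boldsymbol{\omega},\mathbf{k},x)$ via (29), followed by integration over $\boldsymbol{\omega}$, using the property that $\gamma(\mathbf{k},\boldsymbol{\xi},x) = 0$
unless $\xi_1 = 1$: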



ff du 



v^k,xy»* = y. c Z E n h(M>)k En<ro £ 



ff duj 



V {oj, k, x) log V (oj, k, x)e 



6 k,0 
u> k 



m>0 £± ^ 



1 t m n<m 



E nb(fc^v)j^ i£ <4i) 



1 n<fci 



Era E n|^.r 

^1 £m n 



(m-1)! f 

m>0 v t 1 t m n<m 



(42) 
So one arrives at the compact result, where we have used the fact that if $k_1 = 0$ then $k_\ell = 0$
for all $\ell$ (which follows from the definition of the generalized degrees):

\*mJl L \{k\\x\ = ^PikJlogiTcikJ + ^c- jdxp(x)Y^hP(k\x)} (43) 
fci & 

with the average-$c$ Poissonian degree distribution $\pi_c(k) = c^k e^{-c}/k!$.



IV. APPLICATIONS OF THE GENERAL THEORY 



A. Regular random graphs 

Our first application is that of $r$-regular degree distributions, $P(\mathbf{k}|x) = \delta_{\mathbf{k},\mathbf{k}(r)}$,
with $\mathbf{k}(r) = (r, r^2, \ldots, r^L)$. Here one can solve (36) explicitly:



7 (*,€,a:) = ^ [dx'p(x')Q(x,x')J2* 

' k'>0 



(44) 





n^7(fc(r),r,^ 




St 1 


nLi7(fc(r),r,^ 





The solution is seen to be of the form $\gamma(\mathbf{k},\boldsymbol{\xi},x) = \gamma(\mathbf{k},x)\,\delta_{\boldsymbol{\xi},(1,r,r^2,\ldots,r^{L-1})}$ and independent
of $k_L$. Insertion of this form into the above equation then gives

a nti 1 S k 



T f 

7(fc, x) = - dx'p(x')Q(x,x' 



if 1 



j(k(r), x') 



We conclude that 



L-l 



l(k, £ , x) = j(x) y Y[ $k e y\ [ IJ 

where $\gamma(x)$ is the solution of

$$\gamma(x) = \frac{r}{c}\int dx'\,p(x')\,\frac{Q(x,x')}{\gamma(x')}$$



(45) 



(46) 



(47) 



For the entropies (37,43) one then finds

$$\lim_{N\to\infty}\Omega_L[P|\mathbf{x}] = \lim_{N\to\infty}\Omega_L[\{\mathbf{k}\}|\mathbf{x}] = \log\pi_c(r) + r\int dx\,p(x)\log\gamma(x) + \tfrac{1}{2}(c - r) \tag{48}$$

As expected, the two entropies are identical (since for regular graphs there is no entropy
contribution from degree permutations) and independent of $L$ (since upon specifying that
the degrees are $r$-regular, the full distributions $P(\mathbf{k}|x)$ are uniquely specified for any $L$). In
the special case $Q(x,x') = 1$ of uncorrelated degrees the above solution simplifies further.
Now $\gamma(x) = \sqrt{r/c}$, and

$$\lim_{N\to\infty}\Omega_L[P|\mathbf{x}] = \lim_{N\to\infty}\Omega_L[\{\mathbf{k}\}|\mathbf{x}] = \log\pi_c(r) + \tfrac{1}{2}r\log(r/c) + \tfrac{1}{2}(c - r) \tag{49}$$
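The closed form for regular graphs with $Q(x,x') = 1$ is simple enough to tabulate. The sketch below evaluates $\log\pi_c(r) + \tfrac{1}{2}r\log(r/c) + \tfrac{1}{2}(c-r)$ for several values of $c$; the function names and the chosen values are illustrative assumptions.

```python
import math

def log_poisson(k, c):
    """log pi_c(k) with pi_c(k) = c^k e^{-c} / k!."""
    return k * math.log(c) - c - math.lgamma(k + 1)

def regular_entropy(r, c):
    """Entropy per node of r-regular topologies in the ensemble with Q(x, x') = 1."""
    return log_poisson(r, c) + 0.5 * r * math.log(r / c) + 0.5 * (c - r)

r = 3
for c in (2.0, 3.0, 4.0, 6.0):
    print(c, regular_entropy(r, c))
```

The derivative of this expression with respect to $c$ is $\tfrac{1}{2}(r/c - 1)$, so the entropy is largest when the ensemble connectivity matches the imposed degree, $c = r$, where it reduces to $\log\pi_r(r)$.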



B. The case L = 1 



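For $L = 1$ we have already simplified our formula for the function $\gamma(\mathbf{k},\boldsymbol{\xi},x)$ to relation
(35) for a simple function $\gamma(x)$. We can do the same for expression (43) for the entropy,
which gives

$$\lim_{N\to\infty}\Omega_1[\{k\}|\mathbf{x}] = \sum_k P(k)\log\pi_c(k) + \tfrac{1}{2}(c - \bar{k}) + \int dx\,p(x)\sum_k P(k|x)\log\gamma^k(x) \tag{50}$$

$$\lim_{N\to\infty}\Omega_1[P|\mathbf{x}] = \sum_k P(k)\log\pi_c(k) + \tfrac{1}{2}(c - \bar{k}) - \int dx\,p(x)\sum_k P(k|x)\log\big[P(k|x)/\gamma^k(x)\big] \tag{51}$$

with $\pi_c(k) = c^k e^{-c}/k!$, with $\bar{k} = \int dx\,p(x)\sum_k kP(k|x)$, $P(k) = \int dx\,p(x)P(k|x)$, and where
$\gamma(x)$ is to be solved from

$$\gamma(x) = \int dx'\,p(x')\,\frac{Q(x,x')}{c\,\gamma(x')}\sum_k kP(k|x') \tag{52}$$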

We see immediately that for $Q(x,x') = 1$ (the Erdős-Rényi ensemble), and upon choosing
$P(k|x) = P(k)$ (since for $Q(x,x') = 1$ the hidden variables $x$ are obsolete), we would have
had $\gamma(x) = \sqrt{\bar{k}/c}$ for all $x$. Expressions (50) and (51) now become

$$Q(x,x') = 1:\qquad \begin{aligned}\lim_{N\to\infty}\Omega_1[\{k\}|\mathbf{x}] &= \sum_k P(k)\log\pi_c(k) + \tfrac{1}{2}(c - \bar{k}) + \tfrac{1}{2}\bar{k}\log(\bar{k}/c)\\ \lim_{N\to\infty}\Omega_1[P|\mathbf{x}] &= -\sum_k P(k)\log\big[P(k)/\pi_c(k)\big] + \tfrac{1}{2}(c - \bar{k}) + \tfrac{1}{2}\bar{k}\log(\bar{k}/c)\end{aligned} \tag{53}$$

So, if one also chooses $\bar{k} = c$, i.e. if the ensemble and $P(k)$ have the same average
connectivity, the entropy of networks with degree distribution $P(k)$ in the
Erdős-Rényi ensemble is minus the Kullback-Leibler distance between $P(k)$ and a Poisson
degree distribution.
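The two entropies for the Erdős-Rényi case can be evaluated for any degree distribution with a few lines of code. The sketch below is a direct transcription of the $Q(x,x') = 1$ formulas under illustrative assumptions: the function names, the truncation of the Poisson distribution at $k < 60$ and the two-valued test distribution are all hypothetical choices.

```python
import math

def log_poisson(k, c):
    """log pi_c(k) with pi_c(k) = c^k e^{-c} / k!."""
    return k * math.log(c) - c - math.lgamma(k + 1)

def er_entropies(P, c):
    """Return (Omega_1[{k}], Omega_1[P]) for Q = 1; P maps degree k -> probability."""
    kbar = sum(k * p for k, p in P.items())
    shift = 0.5 * (c - kbar) + 0.5 * kbar * math.log(kbar / c)
    omega_seq = sum(p * log_poisson(k, c) for k, p in P.items() if p > 0) + shift
    # Kullback-Leibler distance between P and the Poisson distribution pi_c
    kl = sum(p * (math.log(p) - log_poisson(k, c)) for k, p in P.items() if p > 0)
    return omega_seq, -kl + shift

c = 3.0
poisson = {k: math.exp(log_poisson(k, c)) for k in range(60)}   # truncated Poisson
two_valued = {2: 0.5, 4: 0.5}                                   # same mean, not Poisson

print(er_entropies(poisson, c))      # second entry is (numerically) zero
print(er_entropies(two_valued, c))   # second entry is minus the KL distance
```

For the Poisson distribution itself the statistics-constrained entropy vanishes, as expected for the typical topology, while for any other distribution with $\bar{k} = c$ it is strictly negative.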
An alternative derivation of equation (53) can also be obtained starting from the expression
for the total number $\mathcal{N}[\{k\}]$ of graphs with a given degree sequence, derived in [10]:
Af[{k}] = (kN - (54) 



with A = (k 2 /k^ — 1 and k 2 = ^\ kf/N. The entropy of the degree sequence {k} 

in the Erdős-Rényi ensemble is the normalized logarithm of the probability of having one of the total
number $\mathcal{N}[\{k\}]$ of possible networks in the ensemble. Since in an Erdős-Rényi network each
link has a probability $c/N$ to be present, we have

$$\lim_{N\to\infty}\Omega_1[\{k\}] = \lim_{N\to\infty}\frac{1}{N}\log\left\{\mathcal{N}[\{k\}]\Big(\frac{c}{N}\Big)^{N\bar{k}/2}\Big(1 - \frac{c}{N}\Big)^{N(N-1)/2 - N\bar{k}/2}\right\} \tag{55}$$

Upon inserting the expression (54) for $\mathcal{N}[\{k\}]$ into Eq. (55) we recover (53).
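This counting argument can be checked numerically at finite $N$. The sketch below is not the paper's calculation: it assumes only the leading-order count $\mathcal{N}[\{k\}] \simeq (\bar{k}N - 1)!!/\prod_i k_i!$ and drops every factor that is sub-exponential in $N$, which does not affect the entropy per node. The function names and the test degree sequence are hypothetical.

```python
import math

def log_poisson(k, c):
    """log pi_c(k) with pi_c(k) = c^k e^{-c} / k!."""
    return k * math.log(c) - c - math.lgamma(k + 1)

def finite_n_entropy(degrees, c):
    """(1/N) log{ N[{k}] (c/N)^M (1 - c/N)^(N(N-1)/2 - M) }, with M = sum_i k_i / 2.

    Assumption: N[{k}] ~ (2M - 1)!! / prod_i k_i!  (sub-exponential factors dropped).
    """
    n = len(degrees)
    two_m = sum(degrees)
    m = two_m // 2
    # log (2M - 1)!! = log (2M)! - M log 2 - log M!
    log_count = math.lgamma(two_m + 1) - m * math.log(2) - math.lgamma(m + 1)
    log_count -= sum(math.lgamma(k + 1) for k in degrees)
    log_weight = m * math.log(c / n) + (n * (n - 1) / 2 - m) * math.log1p(-c / n)
    return (log_count + log_weight) / n

def asymptotic_entropy(degrees, c):
    """sum_k P(k) log pi_c(k) + (c - kbar)/2 + (kbar/2) log(kbar/c)."""
    n = len(degrees)
    kbar = sum(degrees) / n
    mean_log_pi = sum(log_poisson(k, c) for k in degrees) / n
    return mean_log_pi + 0.5 * (c - kbar) + 0.5 * kbar * math.log(kbar / c)

c = 2.5
for n in (100, 10_000, 100_000):
    degrees = [2, 4] * (n // 2)      # hypothetical sequence: half the nodes have k = 2
    print(n, finite_n_entropy(degrees, c), asymptotic_entropy(degrees, c))
```

The finite-$N$ value approaches the asymptotic expression with a difference of order $1/N$, also when $\bar{k} \neq c$ as in this example.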

The other terms in (50) apparently represent the effect of average connectivity mismatches
and of the degree correlations induced by $Q(\cdot,\cdot)$, and make matters more complicated. The
simple form of our $L = 1$ equations, however, still allows us to push the analysis further for
certain cases, by solving $\gamma(x)$ explicitly from equation (52). For instance, if the (symmetric)
kernel $Q(x,x')$ has an eigenfunction $f(x) = \sqrt{p(x)k(x)}$, with $k(x) = \sum_k kP(k|x)$, then

$$\int dx'\,Q(x,x')f(x') = \lambda f(x),\quad f(x) = \sqrt{p(x)k(x)}:\qquad \gamma(x) = \sqrt{\lambda/c}\;f(x) \tag{56}$$

If $Q(\cdot,\cdot)$ has this property, together with the normalization $\int dx\,dx'\,p(x)Q(x,x')p(x') = 1$,
then one finds that the entropy (50) becomes

$$\begin{aligned}\lim_{N\to\infty}\Omega_1[\{k\}|\mathbf{x}] &= \sum_k P(k)\log\pi_c(k) + \tfrac{1}{2}(c - \bar{k}) + \tfrac{1}{2}\int dx\,p(x)k(x)\log\big[\lambda p(x)k(x)/c\big]\\ &= \sum_k P(k)\log\pi_c(k) + \tfrac{1}{2}(c - \bar{k}) + \tfrac{1}{2}\int dx\,p(x)k(x)\log\big[p(x)k(x)\big] + \tfrac{1}{2}\bar{k}\log[\lambda/c]\end{aligned} \tag{57}$$

(where $\bar{k} = \int dx\,p(x)k(x)$). Let us next discuss some example kernels $Q(x,x')$ for which
$\gamma(x)$ can be solved explicitly, either directly, or via the above procedure based on using
eigenfunctions of $Q(\cdot,\cdot)$:

• First example:

Here we assume $Q(x,x')$ to be such that the conditional connectivities $k(x) = \sum_k kP(k|x)$
are the typical ones for the ensemble (1), which implies that

$$k(x) = c\int dx'\,Q(x,x')\,p(x') \tag{58}$$

and $\bar{k} = c$. In this case (52) has the solution $\gamma(x) = k(x)/c$, which leads to the
following simple expressions for the entropies:

$$\lim_{N\to\infty}\Omega_1[\{k\}|\mathbf{x}] = \int dx\,p(x)\sum_k P(k|x)\log\pi_{k(x)}(k) \tag{59}$$

$$\lim_{N\to\infty}\Omega_1[P|\mathbf{x}] = -\int dx\,p(x)\sum_k P(k|x)\log\big[P(k|x)/\pi_{k(x)}(k)\big] \tag{60}$$

This indicates that in this case the entropy $\lim_{N\to\infty}\Omega_1[P|\mathbf{x}]$ takes the form of (minus) an
integral over $p(x)$ of the Kullback-Leibler distance between the probabilities $P(k|x)$
and the Poisson distribution $\pi_{k(x)}(k)$. We note that for the hidden variable model the
typical degree distribution of the nodes with hidden variable $x$ is indeed $\pi_{k(x)}$ [20].



• Second example:

$$Q(x,x') = a_0 + a_1\delta(x - x'),\qquad k(x) = \tfrac{1}{2}c/p(x) \tag{61}$$

with $x \in [-1, 1]$. Normalization of $Q(\cdot,\cdot)$ tells us that $a_0 = 1 - a_1\int dx\,p^2(x)$, and we
need $0 < a_1 < [\int dx\,p^2(x)]^{-1}$ to ensure non-negative bond probabilities in our network
ensemble. The networks in this ensemble have a non-trivial community structure: nodes
with the same hidden variable have a larger probability to be connected. Here
one finds a solution with $\bar{k} = c$ and $\gamma(x) = \gamma$, where

$$\gamma = \sqrt{1 - a_1\int dx\,p^2(x) + \tfrac{1}{2}a_1} \tag{62}$$

$$\lim_{N\to\infty}\Omega_1[\{k\}|\mathbf{x}] = \sum_k P(k)\log\pi_c(k) + \tfrac{1}{2}c\log\Big[1 - a_1\int dx\,p^2(x) + \tfrac{1}{2}a_1\Big] \tag{63}$$



• Third example:



g{x) +g{x r ) 



k(x) 



(64) 



2 J 'dx"p(x")g(x"Y p{x) 
with $x \in [-1, 1]$, with the short-hand $\langle\phi\rangle_0 = \tfrac{1}{2}\int_{-1}^{1}dx\,\phi(x)$, and with $g(x) > 0$ for all
$x \in [-1, 1]$. Here one finds the solution



7(2;) 



(9)0+ VWVo 

^c\J Jdx' p(x')g(x') 



x 



J dx p(x)g(x) 



(65) 
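When no closed-form solution is available, the $L = 1$ self-consistency equation $\gamma(x) = \int dx'\,p(x')Q(x,x')k(x')/[c\,\gamma(x')]$ can be solved numerically. The sketch below is one possible approach, not the paper's method: it assumes discrete hidden variables and uses a geometric-mean damping of the fixed-point map, a numerical choice made here because the undamped map can oscillate. The kernel values and names are hypothetical; the result is checked against the first example, where $\gamma(x) = k(x)/c$.

```python
import math

def solve_gamma(p, Q, k_of_x, c, tol=1e-12, max_iter=10_000):
    """Damped iteration for gamma(x) = sum_x' p(x') Q(x,x') k(x') / (c gamma(x'))."""
    n = len(p)
    gamma = [1.0] * n
    for _ in range(max_iter):
        rhs = [sum(p[y] * Q[x][y] * k_of_x[y] / (c * gamma[y]) for y in range(n))
               for x in range(n)]
        new = [math.sqrt(g * r) for g, r in zip(gamma, rhs)]   # geometric-mean damping
        if max(abs(a - b) for a, b in zip(new, gamma)) < tol:
            return new
        gamma = new
    raise RuntimeError("fixed-point iteration did not converge")

c = 3.0
p = [0.25, 0.75]                              # hypothetical two-valued hidden variable
raw = [[3.0, 1.0], [1.0, 1.0]]                # hypothetical symmetric kernel
norm = sum(p[x] * p[y] * raw[x][y] for x in range(2) for y in range(2))
Q = [[raw[x][y] / norm for y in range(2)] for x in range(2)]   # sum p p Q = 1

# Typical connectivities of the ensemble: k(x) = c * sum_x' Q(x, x') p(x').
k_of_x = [c * sum(Q[x][y] * p[y] for y in range(2)) for x in range(2)]
gamma = solve_gamma(p, Q, k_of_x, c)
print(gamma, [k / c for k in k_of_x])
```

With the solution in hand, the entropy for constrained degrees follows by adding $\int dx\,p(x)k(x)\log\gamma(x)$ to the Poissonian and connectivity-mismatch terms.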



C. The case L = 2



Here we first have to find the solution of (36), which now reduces to



l((k 1 ,k 2 ),(l,£),x) = - fdx'p(x')Q(x,x') ^P(£,k'\x' 

k>>0 



X ■ 









(Un),0 








=i7((^') 


(Un),0 





(66) 
We observe that the right-hand side is independent of $k_2$, so the solution of our equation
must have the form $\gamma((k_1, k_2), (1, \xi), x) = \gamma(k_1, \xi, x)$, where



l(k,t,x) = £ fdx'p{x')Q{x,x')Y J P{^k , W) 

c J k>>0 



X- 







: 1 i7&e»,*')' 




^Ci-Ce 


ni 


=i 7(^^,3;') 





(67) 



The entropy would become 

lim Q 2 [{k}\x] = VP(A; 1 )log7r c (A; 1 ) + i[c- f dx p(x)Y j k 1 P(k 1 ,k 2 



|.X') 



k 1 k 2 



+ 



(68) 



Let us limit ourselves to the simplest scenario where there are no degree correlations, i.e.
$Q(x,x') = 1$. Here we have $\gamma(k, \xi, x) = \gamma(k, \xi)$, and we need only the generalized degree
statistics $P(k_1, k_2) = \int dx\,p(x)P(k_1, k_2|x)$. Our formulae thereby reduce to



Z)ci-Ce-i I nl=i 7(f » 60 I 5 fe',fc+E„<s e 



k'>0 



lim fi 2 [{fc}|x] = VP(A; 1 )log7r c (A; 1 ) + i[c- V fciP(fci, fc 2 )l 

fel klk2 



(69) 
(70) 



+ 



k\k 2 



Here one observes the validity of the following simple relation: 

5>(*i,*2)7(k,*i) = 
z — ' c 

fe 2 



(71) 



V. CONCLUSIONS 

In conclusion, we have calculated the entropies $\Omega_L[\{\mathbf{k}\}|\mathbf{x}]$ and $\Omega_L[P|\mathbf{x}]$ of hierarchically
constrained network topologies in the "canonical" ensemble of large sparse networks
described in terms of "hidden variables".

The expression for the entropy $\Omega_L[P|\mathbf{x}]$ assumes a very clear form in the case in which
the network topology under study is the degree distribution of a network of the ensemble.
Here the entropy measures the large deviation of the topology of the given networks from
the typical topology of networks in the chosen ensemble.

The entropy measures the likelihood that a particular network topology belongs to an
ensemble; as such it is an important quantity whenever one seeks to represent or characterize
observed networks in terms of appropriate random network ensembles. We therefore believe
that it may have many applications in the future in the context of community detection
problems as well as other inference problems on complex networks.